
tsm: ensure that snapshot writes are eventually retried/released, even if WriteSnapshot fails #5689

Closed

Conversation

jonseymour
Contributor

This is my attempt to fix the code review issue I identified in #5686.

The controversial aspect of this fix may be the introduction of a commit lock that ensures that
at most one thread is writing a snapshot for a given shard at any given time. I think the lock
is justified because reasoning about concurrent execution of these code paths seems fraught with
danger. Of course, if there are contrary opinions, let's discuss.

@jonseymour jonseymour changed the title tsm: ensure that snapshots are always released, even if WriteSnapshot fails tsm: ensure that snapshots are eventually released, even if WriteSnapshot fails Feb 15, 2016
@jonseymour
Contributor Author

Oops. I need to re-roll with corresponding changes to the tests.

@jonseymour
Contributor Author

On reflection, I think this fix is probably incomplete: the write for the latest snapshot will not gather the keys from snapshots that failed to be written by a previous WriteSnapshot() call, so we end up losing the data from those snapshots. I need to think about this some more. I will re-open the PR when/if I find a solution.

@jonseymour jonseymour closed this Feb 15, 2016
@jonseymour jonseymour reopened this Feb 15, 2016
@jonseymour jonseymour force-pushed the js-failed-write-snapshot branch 2 times, most recently from df3edd7 to 8257570 on February 15, 2016 at 22:46
@jonseymour
Contributor Author

Previously, if a snapshot write failed, the memory was never released and the associated writes were never retried.

Furthermore, if the write error was transient and a subsequent compact was successful, then associated WAL segments would be deleted without ever having been written into TSM files. Such a data loss would not be apparent until influxd was restarted (because the data would still be served from the cache).

Interestingly, these issues made influxd less resilient to transient write errors (which should not require a server restart) than to permanent write errors (which do). The reason is that, provided the WAL segments are not deleted, the normal influxd restart process will eventually compact the WAL segments associated with failed snapshot writes. However, if the write errors are only transient, then the next successful compaction will delete the WAL segments without their contents ever having been written into TSM files.

This series should address these issues.

@jonseymour jonseymour changed the title tsm: ensure that snapshots are eventually released, even if WriteSnapshot fails tsm: ensure that snapshot writes are eventually retried/released, even if WriteSnapshot fails Feb 15, 2016
@jonseymour
Contributor Author

I have now added some tests for the rollback functionality. The new and revised tests show how the association between a cache snapshot and a slice of WAL segment names is established and how WriteSnapshots() can use this capability to retry previously failed snapshot writes.

@jonseymour jonseymour force-pushed the js-failed-write-snapshot branch 7 times, most recently from 953958f to b65864d on February 16, 2016 at 07:22
@jonseymour
Contributor Author

Fixed a thread-safety issue in the implementation: PrepareSnapshots() now returns a clone of the snapshots slice rather than a reference. This allows the snapshotter to modify the contents of the slice without interfering with queries that may be trying to use it.
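
A rough sketch of the cloning approach described above (names such as cache, snapshots and Snapshot are assumptions made for the sketch, not the actual influxdb code):

    package sketch

    import "sync"

    // Snapshot is an illustrative stand-in for the real snapshot type.
    type Snapshot struct{}

    type cache struct {
        mu        sync.RWMutex
        snapshots []*Snapshot
    }

    // PrepareSnapshots returns a copy of the snapshots slice so the snapshotter
    // can reorder or truncate its copy without racing against readers that were
    // handed the original slice.
    func (c *cache) PrepareSnapshots() []*Snapshot {
        c.mu.Lock()
        defer c.mu.Unlock()

        clone := make([]*Snapshot, len(c.snapshots))
        copy(clone, c.snapshots)
        return clone
    }

The clone shares the underlying snapshot pointers, but its backing array is private to the caller, so reordering or truncating it cannot affect readers holding the original slice.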

@jonseymour
Contributor Author

Note: commits prior to 7059331 are from PR #5778 and will not be delivered by the final version of this PR.

@jonseymour
Contributor Author

This query output shows how this fix recovered the snapshots that could not be written because of temporary removal of write permission to the shard's TSM directory. When the write permissions were restored, the snapshot count dropped to zero again.

[screenshot: demonstration of fix]

The second screenshot is taken after restarting the server, indicating that the full contents of the cache were successfully persisted to TSM files.

[screenshot: after start]

Log output showing compaction of resulting TSM files:

[tsm1] 2016/02/22 18:00:27 error writing snapshot from compactor: remove /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000070-000000001.tsm.tmp: no such file or directory
[tsm1] 2016/02/22 18:00:27 error writing snapshot: remove /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000070-000000001.tsm.tmp: no such file or directory
[tsm1] 2016/02/22 18:02:57 error writing snapshot from compactor: remove /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000071-000000001.tsm.tmp: no such file or directory
[tsm1] 2016/02/22 18:02:57 error writing snapshot: remove /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000071-000000001.tsm.tmp: no such file or directory
[tsm1] 2016/02/22 18:05:27 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/02/22 18:05:27 compacting level 1 group (0) /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000072-000000001.tsm (#0)
[tsm1] 2016/02/22 18:05:27 compacting level 1 group (0) /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000073-000000001.tsm (#1)
[tsm1] 2016/02/22 18:05:27 compacted level 1 group (0) into /Users/jonseymour/.influxdb/data/_internal/monitor/1/000000073-000000002.tsm.tmp (#0)
[tsm1] 2016/02/22 18:05:27 compacted level 1 group 0 of 2 files into 1 files in 1.629107ms

@jonseymour
Contributor Author

@jwilder - an example of (manual) integration test output which shows that this fix does what it intends to do.

To run the test, do:

   cd tsdb/engine/tsm1

Then run:

   GORACE=history=1000 go test -race cache_race_test.go

The current implementation of Cache.Deduplicate is not thread safe, as
revealed by a code review and, separately, by a Go race detector test case.

The first race condition arises because entries being deduplicated by the
Cache.Deduplicate method were being read by the cache logic without the
deduplicating thread taking a write lock.

A second race condition arose with one of the initial fixes: the snapshot
map was being updated by the deduplicating thread without readers of the
snapshot taking a read lock on it.

This fix works by taking a copy of the snapshot's store (called dirty) while
the write lock of the engine's cache is held. Only the entries that need sorting are
cloned. The other entries are copied by reference.

Deduplication occurs on the dirty slice rather than the store slice.
There is no thread-safety issue with the dirty slice because only the
snapshotting thread sees that slice. The entries that are updated by the
snapshotting thread are the entries that need sorting, but all of these
entries are clones which are not visible to any other thread.

When we are done deduplicating, we update the store of the snapshot while holding the
write lock of the engine's cache - not that of the snapshot itself.

Once a snapshot is deduplicated, it is never modified again. In the time
between the snapshot being created and the deduplication finishing, the
snapshot may be modified; however, in these cases the writes are
guarded by a write lock on the engine's cache and the snapshotting thread
does not refer to those values.

The end result is that it is no longer necessary to take any snapshot locks;
the lock guarding the engine's cache is sufficient.

The snapshotter does need to take the write lock on the engine's cache, but only at two points:

* while taking the initial snapshot
* after deduplication is complete when updating the cache's store reference

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
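
A rough sketch of the copy-then-swap pattern described in this commit message (the types and methods here - entry, needsSort, clone, deduplicate - are illustrative assumptions, not the actual implementation):

    package sketch

    import (
        "sort"
        "sync"
    )

    // Illustrative stand-ins for the real cache types; names are assumptions.
    type entry struct {
        values []float64
        sorted bool
    }

    func (e *entry) needsSort() bool { return !e.sorted }

    func (e *entry) clone() *entry {
        return &entry{values: append([]float64(nil), e.values...), sorted: e.sorted}
    }

    // deduplicate stands in for the real sort-and-remove-duplicates step.
    func (e *entry) deduplicate() {
        sort.Float64s(e.values)
        e.sorted = true
    }

    type Snapshot struct{ store map[string]*entry }

    type cache struct {
        mu sync.RWMutex // the engine cache's lock
    }

    // deduplicateSnapshot applies the pattern described in the commit message:
    // copy under the cache's write lock, sort the private copy with no lock
    // held, then swap the result back in under the cache's write lock.
    func (c *cache) deduplicateSnapshot(snap *Snapshot) {
        // 1. Take a "dirty" copy of the snapshot's store. Entries that need
        //    sorting are cloned; the rest are shared by reference.
        c.mu.Lock()
        dirty := make(map[string]*entry, len(snap.store))
        for k, e := range snap.store {
            if e.needsSort() {
                dirty[k] = e.clone()
            } else {
                dirty[k] = e
            }
        }
        c.mu.Unlock()

        // 2. Deduplicate the private copy. Only the snapshotting goroutine can
        //    see the clones, so no locks are needed here.
        for _, e := range dirty {
            if e.needsSort() {
                e.deduplicate()
            }
        }

        // 3. Publish the deduplicated store under the cache's write lock, not
        //    a lock on the snapshot itself.
        c.mu.Lock()
        snap.store = dirty
        c.mu.Unlock()
    }
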
@jonseymour
Contributor Author

Rebased to remove the "comment only assertions" commit that was removed from another PR.

@jwilder
Contributor

jwilder commented Feb 23, 2016

@jonseymour I think supporting multiple snapshots in the cache is not necessary and is complicating things. I have an alternative fix in #5789 that removes the multiple snapshot support.

@jonseymour
Contributor Author

Rebased onto a merge of #5701 and 7f457b8; this has fixes equivalent to the tip of #5795. It has one extra commit (7a03df2) from #5777 that is not in #5795.

Also, introduce a currently empty RollbackSnapshot method.

Later, we are going to use a commit pattern where resources are first prepared
and then either committed or rolled back.

Currently, the code doesn't understand the concept of a rollback, but
we will add it later.

For now, we will simply rename the existing methods to their final names and
add the RollbackSnapshot() method as a placeholder.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
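
A rough sketch of the intended Prepare/Commit-or-Rollback caller pattern (the method names follow this PR's naming, but the signatures and bodies are assumptions made for the sketch):

    package sketch

    // Cache is an illustrative stand-in for the real cache type.
    type Cache struct{}

    func (c *Cache) PrepareSnapshots() {} // acquire the snapshot and any associated locks
    func (c *Cache) CommitSnapshots()  {} // success path: release everything
    func (c *Cache) RollbackSnapshot() {} // failure path: release the lock, keep the snapshot

    // writeSnapshot shows the intended caller-side shape: exactly one of
    // Commit or Rollback runs on every path, so prepared resources are
    // always released.
    func writeSnapshot(c *Cache, write func() error) error {
        c.PrepareSnapshots()
        if err := write(); err != nil {
            c.RollbackSnapshot()
            return err
        }
        c.CommitSnapshots()
        return nil
    }
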
…led.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
This lock ensures that at most one thread is actively attempting to
commit the WAL log for a single shard at a time.

The assumption here is that it is not helpful to have more than
one thread attempting to do this, and that reasoning about correctness
is much more easily achieved if one only has to consider
a single thread.

The Prepare/Commit or Rollback pattern ensures that the lock will always
be released.

This is not to say that the TSM writing process might not benefit
from concurrency, only that the concurrency should be managed within
the scope of the shard-wide commit lock.

Note that the acquisition order is commit lock first, then cache mutex.
This ensures the cache is available for writes while the snapshot is
being processed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
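
A rough sketch of the locking order described above (field names and method bodies are assumptions): the commit lock is held from Prepare until Commit or Rollback, while the cache mutex is taken only for the short sections that touch shared state, so writes into the cache can proceed while the snapshot is written to disk.

    package sketch

    import "sync"

    type cache struct {
        commit sync.Mutex   // shard-wide commit lock: held from Prepare to Commit/Rollback
        mu     sync.RWMutex // cache mutex: guards the store and the snapshot slice
    }

    func (c *cache) PrepareSnapshots() {
        // Acquisition order: commit lock first, then the cache mutex.
        c.commit.Lock()

        c.mu.Lock()
        // ... swap the current store into a new snapshot ...
        c.mu.Unlock()
        // The cache mutex is released here, so concurrent writes into the
        // cache are not blocked while the snapshot is written to disk.
    }

    func (c *cache) CommitSnapshots() {
        c.mu.Lock()
        // ... remove the written snapshots from the snapshot slice ...
        c.mu.Unlock()

        c.commit.Unlock()
    }

    func (c *cache) RollbackSnapshot() {
        // The snapshot stays in place so a later compaction can retry it;
        // only the commit lock is released.
        c.commit.Unlock()
    }
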
Now that only one thread is allowed to be actively writing
a snapshot at a time, we can simplify the commit logic to simply
remove all active snapshots. It also means we do not need
to scan for the committed snapshot, since we are going to purge all of them.

Note that Rollback doesn't touch the snapshot slice - it leaves the
snapshot in place until a subsequent compaction succeeds, at which
point it will be removed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The previous commits ensured that snapshots that failed to write
would eventually be released. However, only the current snapshot
would get written to disk. This commit ensures that the writes for
previously failed snapshots are retried.

To ensure we clean up only those WAL segments that we have successfully
compacted, we need to keep track of precisely which WAL segments each
snapshot captures, and retry all the snapshots in the same order.
We only remove WAL segments for snapshots that we successfully write,
and we stop on the first error we encounter.

When we rollback, we only rollback the snapshots that we failed to
write.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
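
A rough sketch of the retry behaviour described above (the names snapshot, segments, write, removeSegments, commit and rollback are illustrative assumptions, not the actual API):

    package sketch

    // snapshot is an illustrative stand-in: each prepared snapshot remembers
    // exactly which WAL segment files it captured.
    type snapshot struct {
        segments []string
    }

    // writeSnapshots retries previously failed snapshots together with the
    // current one, in order. WAL segments are only removed for snapshots that
    // were successfully written; we stop on the first error and roll back only
    // the snapshots that were not written, keeping any partial progress.
    func writeSnapshots(snaps []*snapshot, write func(*snapshot) error, removeSegments func([]string)) error {
        for i, s := range snaps {
            if err := write(s); err != nil {
                rollback(snaps[i:]) // only the snapshots that failed to be written
                return err
            }
            removeSegments(s.segments) // safe: this snapshot's data is now in a TSM file
            commit(s)
        }
        return nil
    }

    func commit(s *snapshot)      { /* drop s from the cache's snapshot slice */ }
    func rollback(ss []*snapshot) { /* leave these snapshots in place for the next retry */ }
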
Add some tests to check that new files specified on PrepareSnapshots are
associated with the latest snapshot.

The test now demonstrates that if a snapshot is rolled back, only the failed
snapshots will appear in the slice returned by the next
PrepareSnapshots call.

The test demonstrates how the discovery of new WAL segments leads to the
new files being associated with the latest snapshot.

The test demonstrates that when retrying multiple snapshots, we
can successfully commit partial progress, even if we need to rollback
one of the later snapshots.

A test to check that the commit lock is working as expected. In particular,
we want to check that if there is a pending commit, a prepare issued
by a concurrent goroutine will block until the
current holder of the lock either commits or rolls back.

This test also checks that concurrent writes into the cache
are not prevented by goroutines that hold the cache's commit lock.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
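
A rough sketch of a test along these lines, written against a minimal stand-in for the commit lock (the type and method names are assumptions, not the real influxdb API):

    package sketch

    import (
        "sync"
        "testing"
        "time"
    )

    // commitCache is a minimal stand-in for the commit-lock behaviour under test.
    type commitCache struct {
        commit sync.Mutex // held from Prepare until Commit or Rollback
    }

    func (c *commitCache) PrepareSnapshots() { c.commit.Lock() }
    func (c *commitCache) CommitSnapshots()  { c.commit.Unlock() }

    func TestPrepareBlocksUntilCommit(t *testing.T) {
        c := &commitCache{}
        c.PrepareSnapshots() // this goroutine now holds the commit lock

        done := make(chan struct{})
        go func() {
            c.PrepareSnapshots() // must block until the holder commits or rolls back
            c.CommitSnapshots()
            close(done)
        }()

        select {
        case <-done:
            t.Fatal("concurrent PrepareSnapshots did not block while the commit lock was held")
        case <-time.After(50 * time.Millisecond):
            // still blocked, as expected
        }

        c.CommitSnapshots() // release the commit lock

        select {
        case <-done:
            // the blocked prepare proceeded once the lock was released
        case <-time.After(time.Second):
            t.Fatal("PrepareSnapshots did not unblock after the commit")
        }
    }
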
@jwilder
Contributor

jwilder commented Feb 23, 2016

@jonseymour I'm closing this in favor of #5789 because it resolves #5686 by removing some unnecessary complexity in the cache. Since we don't need the multiple snapshots, it does not make sense to add additional layers on top of them that add further complexity to the cache.
